<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>

<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->

<head>
    <title>HBase</title>
</head>
<body bgcolor="white">
<a href="http://hbase.org">HBase</a> is a scalable, distributed database built on <a href="http://hadoop.apache.org/core">Hadoop Core</a>.

<h2><a name="requirements">Requirements</a></h2>
<ul>
  <li>Java 1.5.x, preferably from <a href="http://www.java.com/en/download/">Sun</a>.
  </li>
  <li>Hadoop 0.16.x.  This version of HBase will only run on Hadoop 0.16.x.</a>.
  </li>
  <li>
    ssh must be installed and sshd must be running to use Hadoop's
    scripts to manage remote Hadoop daemons.
  </li>
  <li>HBase currently is a file handle hog.  The usual default of
  1024 on *nix systems is insufficient if you are loading any significant
  amount of data into regionservers.  See the
  <a href="http://wiki.apache.org/hadoop/Hbase/FAQ#6">FAQ: Why do I see "java.io.IOException...(Too many open files)" in my logs?</a>
  for how to up the limit.</li>
</ul>

<h2><a name="getting_started" >Getting Started</a></h2>
<p>
What follows presumes you have obtained a copy of HBase and are installing
for the first time. If upgrading your
HBase instance, see <a href="#upgrading">Upgrading</a>.
</p>
<p>
<ul>
<li><code>${HBASE_HOME}</code>: Set HBASE_HOME to the location of the HBase root: e.g. <code>/user/local/hbase</code>.
</li>
</ul>
</p>
<p>Edit <code>${HBASE_HOME}/conf/hbase-env.sh</code>.  In this file you can
set the heapsize for HBase, etc.  At a minimum, set <code>JAVA_HOME</code> to point at the root of
your Java installation.
<p>
If you are running a standalone operation, there should be nothing further to configure; proceed to
<a href=#runandconfirm>Running and Confirming Your Installation</a>.  If you are running a distributed
operation, continue reading.
</p>

<h2><a name="distributed" >Distributed Operation</a></h2>
<p>Distributed mode requires an instance of the Hadoop Distributed File System (DFS).
See the Hadoop <a href="http://lucene.apache.org/hadoop/api/overview-summary.html#overview_description">
requirements and instructions</a> for how to set up a DFS.</p> 
<p>Once you have confirmed your DFS setup, configuring HBase requires modification of the following two files:
<code>${HBASE_HOME}/conf/hbase-site.xml</code> and <code>${HBASE_HOME}/conf/regionservers</code>.  
The former needs to be pointed at the running Hadoop DFS instance.  The latter file lists
all the members of the HBase cluster.
</p>
<p>Use <code>hbase-site.xml</code> to override the properties defined in 
<code>${HBASE_HOME}/conf/hbase-default.xml</code> (<code>hbase-default.xml</code> itself 
should never be modified).  At a minimum the <code>hbase.master</code> and the 
<code>hbase.rootdir</code> properties should be redefined 
in <code>hbase-site.xml</code> to configure the <code>host:port</code> pair on which the 
HMaster runs (<a href="http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture">read about the 
HBase master, regionservers, etc</a>) and to point HBase at the Hadoop filesystem to use.  For
example, adding the below to your hbase-site.xml says the master is up on port 60000 on the host
example.org and that HBase should use the <code>/hbase</code> directory in the HDFS whose namenode
is at port 9000, again on example.org:
</p>
<pre>
&lt;configuration&gt;

  &lt;property&gt;
    &lt;name&gt;hbase.master&lt;/name&gt;
    &lt;value&gt;example.org:60000&lt;/value&gt;
    &lt;description&gt;The host and port that the HBase master runs at.
    &lt;/description&gt;
  &lt;/property&gt;

  &lt;property&gt;
    &lt;name&gt;hbase.rootdir&lt;/name&gt;
    &lt;value&gt;hdfs://example.org:9000/hbase&lt;/value&gt;
    &lt;description&gt;The directory shared by region servers.
    &lt;/description&gt;
  &lt;/property&gt;

&lt;/configuration&gt;
</pre>
<p>
The <code>regionserver</code> file lists all the hosts running HRegionServers, one 
host per line  (This file is HBase synonym of the hadoop slaves file at 
<code>${HADOOP_HOME}/conf/slaves</code>).
</p>
<p>Of note, if you have made <i>HDFS client configuration</i> on your hadoop cluster, hbase will not
see this configuration unless you do one of the following:
<ul>
    <li>Add a pointer to your <code>HADOOP_CONF_DIR</code> to <code>CLASSPATH</code> in <code>hbase-env.sh</code></li>
    <li>Add a copy of <code>hadoop-site.xml</code> to <code>${HBASE_HOME}/conf</code>, or</li>
    <li>If only a small set of HDFS client configurations, add them to <code>hbase-site.xml</code></li>
</ul>
An example of such an HDFS client configuration is <code>dfs.replication</code>.  If for example,
you want to run with a replication factor of 5, hbase will make files will create files with
the default of 3 unless you do the above to make the configuration available to hbase.
</p>

<h2><a name="runandconfirm">Running and Confirming Your Installation</a></h2>
<p>If you are running in standalone, non-distributed mode, HBase by default uses
the local filesystem.</p>

<p>If you are running a distributed cluster you will need to start the Hadoop DFS daemons 
before starting HBase and stop the daemons after HBase has shut down.  Start and 
stop the Hadoop DFS daemons by running <code>${HADOOP_HOME}/bin/start-dfs.sh</code>.
Ensure it started properly by testing the put and get of files into the Hadoop filesystem.
HBase does not normally use the mapreduce daemons.  These do not need to be started.</p>

<p>Start HBase with the following command:
</p>
<pre>
${HBASE_HOME}/bin/start-hbase.sh
</pre>
<p>
Once HBase has started, enter <code>${HBASE_HOME}/bin/hbase shell</code> to obtain a 
shell against HBase from which you can execute HQL commands (HQL is a severe subset of SQL).  
In the HBase shell, type <code>help;</code> to see a list of supported HQL commands.  Note
that all commands in the HBase 
shell must end with <code>;</code>.  Test your installation by creating, viewing, and dropping 
a table, as per the help instructions.  Be patient with the <code>create</code> and 
<code>drop</code> operations as they may each take 10 seconds or more.  To stop HBase, exit the 
HBase shell and enter:
</p>
<pre>
${HBASE_HOME}/bin/stop-hbase.sh
</pre>
<p>
If you are running a distributed operation, be sure to wait until HBase has shut down completely 
before stopping the Hadoop daemons.
</p>
<p>
The default location for logs is <code>${HBASE_HOME}/logs</code>.
</p>
<p>HBase also puts up a UI listing vital attributes.  By default its deployed on the master host
at port 60010 (HBase regionservers listen on port 60020 by default and put up an informational
http server at 60030).</p>

<h2><a name="upgrading" >Upgrading</a></h2>
<p>After installing a new HBase on top of data written by a previous HBase version, before
starting your cluster, run the <code>${HBASE_DIR}/bin/hbase migrate</code> migration script.
It will make any adjustments to the filesystem data under <code>hbase.rootdir</code> necessary to run
the HBase version. It does not change your install unless you explicitly ask it to.
</p>

<h2><a name="client_example">Example API Usage</a></h2>
<p>Once you have a running HBase, you probably want a way to hook your application up to it. 
  If your application is in Java, then you should use the Java API. Here's an example of what 
  a simple client might look like.  This example assumes that you've created a table called
  "myTable" with a column family called "myColumnFamily".
</p>

<div style="background-color: #cccccc; padding: 2px">
<code><pre>import org.apache.hadoop.hbase.HTable;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HStoreKey;
import org.apache.hadoop.hbase.HScannerInterface;
import org.apache.hadoop.io.Text;
import java.io.IOException;
import java.util.SortedMap;
import java.util.TreeMap;

public class MyClient {

  public static void main(String args[]) throws IOException {
    // You need a configuration object to tell the client where to connect.
    // But don't worry, the defaults are pulled from the local config file.
    HBaseConfiguration config = new HBaseConfiguration();

    // This instantiates an HTable object that connects you to the "myTable"
    // table. 
    HTable table = new HTable(config, new Text("myTable"));

    // Tell the table that you'll be updating row "myRow". The lockId you get
    // back uniquely identifies your batch of updates. (Note, however, that 
    // only one update can be in progress at a time. This is fixed in HBase
    // version 0.2.0.)
    long lockId = table.startUpdate(new Text("myRow"));

    // The HTable#put method takes the lockId you got from startUpdate, a Text
    // that describes what cell you want to put a value into, and a byte array
    // that is the value you want to store. Note that if you want to store 
    // strings, you have to getBytes() from the string for HBase to understand
    // how to store it. (The same goes for primitives like ints and longs and
    // user-defined classes - you must find a way to reduce it to bytes.)
    table.put(lockId, new Text("myColumnFamily:columnQualifier1"), 
      "columnQualifier1 value!".getBytes());

    // Deletes are batch operations in HBase as well. 
    table.delete(lockId, new Text("myColumnFamily:cellIWantDeleted"));

    // Once you've done all the puts you want, you need to commit the results.
    // The HTable#commit method takes the lockId that you got from startUpdate
    // and pushes the batch of changes you made into HBase.
    table.commit(lockId);

    // Alternately, if you decide that you don't want the changes you've been
    // accumulating anymore, you can use the HTable#abort method.
    // table.abort(lockId);

    // Now, to retrieve the data we just wrote. Just like when we store them,
    // the values that come back are byte arrays. If you happen to know that
    // the value contained is a string and want an actual string, then you 
    // must convert it yourself.
    byte[] valueBytes = table.get(new Text("myRow"), 
      new Text("myColumnFamily:columnQualifier1"));
    String valueStr = new String(valueBytes);
    
    // Sometimes, you won't know the row you're looking for. In this case, you
    // use a Scanner. This will give you cursor-like interface to the contents
    // of the table.
    HStoreKey row = new HStoreKey();
    SortedMap<Text, byte[]> columns = new TreeMap<Text, byte[]>();
    HScannerInterface scanner = 
      // we want to get back only "myColumnFamily:columnQualifier1" when we iterate
      table.obtainScanner(new Text[]{new Text("myColumnFamily:columnQualifier1")}, 
      // we want to start scanning from an empty Text, meaning the beginning of
      // the table
      new Text(""));
      
    // Now, for the actual iteration.
    while(scanner.next(row, columns)) {
      // print out the row we found and the columns we were looking for
      System.out.println("Found row: " + row.getRow() + " with value: " +
              new String((byte[]) columns.get(columns.firstKey())));
    }
  }
}</pre></code>
</div>

<p>There are many other methods for putting data into and getting data out of 
  HBase, but these examples should get you started. See the HTable javadoc for
  more methods. Additionally, there are methods for managing tables in the 
  HBaseAdmin class.</p>

<p>If your client is NOT Java, then you should consider the Thrift or REST 
  libraries.</p>

<h2><a name="related" >Related Documentation</a></h2>
<ul>
  <li><a href="http://hbase.org">HBase Home Page</a>
  <li><a href="http://wiki.apache.org/hadoop/Hbase">HBase Wiki</a>
  <li><a href="http://hadoop.apache.org/">Hadoop Home Page</a>
</ul>

</body>
</html>
